A general method for deriving maximally informative sigmoidal tuning curves for neural systems
with small normalized variability is presented. The optimal tuning curve is a nonlinear function of
the cumulative distribution function of the stimulus and depends on the mean-variance relationship
of the neural system. The derivation is based on a known relationship between Shannon's mutual
information and Fisher information, and the optimality of Jeffreys' prior. It relies on the existence
of closed-form solutions to the converse problem of optimizing the stimulus distribution for a given
tuning curve. It is shown that maximum mutual information corresponds to constant Fisher
information only if the stimulus is uniformly distributed. As an example, the case of sub-Poisson
binomial firing statistics is analyzed in detail.

Stimuli transduced by biological sensory systems are communicated to the brain by short duration electrical pulses 
known as action potentials. These 'spikes' are generated by synaptic transmission from receptor cells, and
propagate to the brain along nerve fibers.

The derivation in this paper applies to rate coding neurons or neural populations. Although the results may be 
relevant for cortical neurons, they are more likely to be useful for sensory neuronal populations whose function is to 
code a random and continuously varying stimulus parameter, and where the variability between neurons is largely 
uncorrelated, e.g. fibres of the cochlear nerve 0].

In rate coding neurons individual action potential timings are not important, and information is coded by mean 
firing rate [H, 0], i.e. the average number of action potentials observed while a stimulus $x$ is constant for some duration
$t$. Experimentally, if firing rate measurements are obtained for a range of stimulus intensities, an average tuning curve
(also variously known as the stimulus-response curve, gain function or rate-level function) can be plotted as a function
of the stimulus intensity [E 0) 0|.
There is usually natural variability in the firing rate for a fixed stimulus, which is often called noise [1]. Although
this variability has led to many previous Shannon information theoretic Q studies of neurons and populations of
neurons, e.g. 0, S, 0, [13, [lH, results for the tuning curve that maximizes information transfer for a rate-coding
neuron appear less frequently [1, |3, IH, 13 1. Furthermore, such studies usually focus on neurons that have a so-called
preferred stimulus, and a unimodal tuning curve.
In contrast, optimality conditions for sigmoidal tuning curves where firing rates increase monotonically with stimulus
intensity, as in Eq. (11) and Figs 2(b) and 3(a) below, have received little attention. The work of [14, 15] is a
notable exception. The results here differ from [14, 15], in that we maximize mutual information for sigmoidal tuning
curves, rather than optimizing Fisher information [6]. Furthermore, our results are far more general than the Poisson
assumption of [14, 15], as they apply for Fano factors other than unity.

Although noisy rate coding neurons are often modeled as a Poisson point process [1], in some cases the measured
stimulus-dependent variance can be less than the mean (sub-Poisson) or larger than it (super-Poisson). For
example, while the variance typically might be approximately Poisson for firing rates close to zero, it can decrease
if the firing rate saturates, due to refractoriness [17]. This can lead, for example, to binomial spiking rather than
Poisson spiking [18], where the variance is a quadratic function of the mean (as shown in Fig. 1 and given below in
Eq. (9)), or even the 'scalloped' minimum variance curve [19].

We present our results in terms of normalized conditional mean firing rate, $T(x) \in [0, 1]$, and normalized variance,
$V(x)$. Here we consider only monotonically increasing (sigmoidal) tuning curves, so that the derivative of $T(x)$ with
respect to stimulus $x$ is strictly nonnegative. Assuming a maximum of $N$ spikes can be produced while a stimulus
is unchanged (determined, for example, by refractory times and signal correlation times, or the number of parallel
neurons), normalization reduces the mean by a factor of $N$, while the variance is reduced by a factor of $N^2$. Hence
a plot of variance against mean for a Poisson system is a straight line with slope $1/N$. Normalized sub-Poisson (i.e.
Fano factor smaller than unity) mean-variance curves fall below this line (see Fig. 1), and super-Poisson (Fano factor
larger than unity) above it.

Our aim is to find optimal tuning curves for the class of sigmoidal neurons or populations where the normalized
variance can be expressed as $V(x) = s^2 h(T(x))$, where $h(\cdot)$ is an arbitrary function that describes how the variance
changes with the mean. The parameter $s^2$ acts to scale the maximum normalized variability, and is typically inversely
proportional to $N$, i.e. related to the integration time in an individual neuron, or the number of neurons in a
population, as in [13], [15]. Our results hold exactly only in the small $s$ limit, meaning that the integration time or
number of neurons must be sufficiently large. Otherwise the actual mutual information is closely lower bounded by
that of the $s \to 0$ case. Conditional independence in the variability across a population is also assumed, such as that
of the cochlear nerve Q.

Our derivation builds on previous work on the mutual information in neural systems where the instantaneous 
normalized firing rate $y$ in response to stimulus value $x$ can be described as [13]

$$y(x) = T(x) + \sqrt{V(x)}\,\xi. \qquad (1)$$

In Eq. (1), $T(x)$ and $V(x) = s^2 h(T(x))$ are deterministic functions of the stimulus, and $\xi$ is an arbitrary random
variable with zero mean and unit variance. For the special case where $\xi$ is Gaussian, the result is a conditionally
Gaussian channel, recently of much interest in optical and wireless communications [20].
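As a rough illustration of the model in Eq. (1), the following Python sketch draws samples of the normalized rate for one fixed stimulus value and compares the sample mean and variance with $T$ and $s^2 h(T)$. The function name `sample_rate`, the Gaussian choice for $\xi$, the example relation $h(T) = T(1-T)$ and the numerical values are assumptions made only for this illustration.

```python
import math
import random

def sample_rate(T, s, h, n, rng):
    """Draw n samples of y = T + sqrt(V) * xi with V = s^2 * h(T).

    xi is taken to be standard Gaussian here (an assumption; the model
    only requires zero mean and unit variance).
    """
    sd = s * math.sqrt(h(T))
    return [T + sd * rng.gauss(0.0, 1.0) for _ in range(n)]

rng = random.Random(0)            # fixed seed for repeatability
h = lambda T: T * (1.0 - T)       # illustrative mean-variance relation
ys = sample_rate(T=0.3, s=0.1, h=h, n=20000, rng=rng)

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
print(mean, var)   # should be near T = 0.3 and s^2 * h(T) = 0.0021
```

Because the noise scale depends on $T$, repeating the call with a different `T` changes the spread of the samples as well as their mean.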

Under regularity conditions on $\xi$, [13] showed that the Fisher information [^, [21] about a specific stimulus value,
$x$, in an observation, $y$, for $s$ sufficiently small is

$$J(x) \simeq \frac{k_1}{V(x)}\left(\frac{dT(x)}{dx}\right)^2, \qquad (2)$$

while the Shannon mutual information Q between the random stimulus and the firing rate is

$$I(x,y) \simeq H(x) - \frac{1}{2}\int_x f_x(x)\log_2\left(\frac{2\pi e}{J(x)}\right)dx + k_2. \qquad (3)$$

In the above equations, $H(x)$ is the differential entropy of the stimulus, $f_x(x)$ is its probability density function (PDF),
and $k_1$ and $k_2$ are constants that depend entirely on the PDF of $\xi$. If $\xi$ is Gaussian then $k_1 = 1$ and $k_2 = 0$ [13].
More general derivations of Eq. (3) appear in [tI, d, 22 1.



As discussed in 0, H, 13, 23 1, the PDF of the stimulus that maximizes the mutual information of Eq. (3) is
proportional to the square root of the Fisher information. Such a PDF is called Jeffreys' prior [22], which here we
denote as $f_J(x)$. Upon letting $k_J = \int_\phi \sqrt{J(\phi)}\,d\phi$, the optimal stimulus PDF for Eqs (1)-(3) is therefore

$$f_x^{\circ}(x) = f_J(x) = \frac{\sqrt{J(x)}}{k_J} = \frac{\sqrt{k_1}}{s\,k_J\sqrt{h(T(x))}}\,\frac{dT(x)}{dx}. \qquad (4)$$

What has not previously been recognized is that optimizing Eq. (3) can lead to general closed form expressions
for the optimal sigmoidal tuning curve, for arbitrary stimulus distributions and non-Poisson variability. This result 
requires that closed form expressions for the cumulative distribution function (CDF) of the optimal stimulus exist. 
Using Eqs Q and Q, this CDF is 

$$F_x^{\circ}(x) = \int_{-\infty}^{x} f_x^{\circ}(\phi)\,d\phi = \frac{\int_0^{T(x)} h(\theta)^{-0.5}\,d\theta}{\int_0^{1} h(\theta)^{-0.5}\,d\theta}, \qquad (5)$$

which is independent of $s$ and $\xi$. If Eq. (5) can be inverted to isolate $T(x)$ on one side of the equation, the resulting
expression also maximizes the mutual information, and is the optimal tuning curve for a given stimulus, $T^{\circ}(x)$.
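When Eq. (5) cannot be inverted by hand, the inversion can be approximated numerically. The sketch below is one possible approach, not the method used in the paper: it tabulates the ratio of integrals in Eq. (5) as a function of $T$ with a midpoint rule, then solves for the $T$ whose value matches a given stimulus CDF value $F$. The names `rate_cdf` and `optimal_rate`, the grid size, and the two example choices of $h$ (constant, and the quadratic relation $h(T) = T(1-T)$) are assumptions for illustration.

```python
import math

def rate_cdf(h, n=20000):
    """Tabulate G(T) = int_0^T h^-0.5 / int_0^1 h^-0.5 on a uniform grid of T.

    Uses the midpoint rule, which tolerates integrable endpoint
    singularities of h^-0.5 but converges slowly near them.
    """
    d = 1.0 / n
    cum = [0.0]
    for k in range(n):
        cum.append(cum[-1] + d / math.sqrt(h((k + 0.5) * d)))
    return [c / cum[-1] for c in cum]

def optimal_rate(F, G):
    """Solve G(T) = F for T by bisection on the table plus linear interpolation."""
    lo, hi = 0, len(G) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if G[mid] <= F:
            lo = mid
        else:
            hi = mid
    frac = (F - G[lo]) / (G[hi] - G[lo])
    return (lo + frac) / (len(G) - 1)

G_const = rate_cdf(lambda t: 1.0)            # constant variance
G_binom = rate_cdf(lambda t: t * (1.0 - t))  # quadratic mean-variance relation
for F in (0.1, 0.5, 0.9):
    print(F, optimal_rate(F, G_const), optimal_rate(F, G_binom))
```

For constant $h$ the table is linear, so the optimal normalized rate simply equals the stimulus CDF value. For the quadratic relation the numerical result should lie close to $0.5 - 0.5\cos(\pi F)$, with a small error caused by the endpoint singularities.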

We note that while previous work has discussed the optimal tuning curve for two simple relationships between $T(x)$
and $V(x)$, i.e. constant variance [13], and the Poisson case ^3|], the integrals in Eq. (5) are trivial in the former case,
and no explicit expression for the optimal tuning curve for arbitrary stimuli was given in the latter.

Although Eq. (4) is known to maximize Eq. (3), it has also not been recognized that Eq. (3) can be rewritten as

$$I(x,y) = 0.5\log_2\left(\frac{k_J^2}{2\pi e}\right) - D(f_x\|f_J) + k_2, \qquad (6)$$
where $D(\cdot\|\cdot)$ represents the relative entropy (Kullback-Leibler divergence) Q between the distributions with PDFs
$f_x$ and $f_J$ [24]. Since relative entropy is always non-negative, the mutual information is maximized when $f_x = f_J$. As
well as a new way of verifying the optimality of Jeffreys' prior, Eq. (6) allows calculation of the reduction in mutual
information when the tuning curve and the stimulus distribution are not optimally matched.
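As a small worked case of this reduction, the Python sketch below estimates the relative entropy for one mismatched pair: a stimulus that is uniform on $[0, 1]$ presented to a linear tuning curve $T(x) = x$ under the quadratic mean-variance relation, for which the matched stimulus PDF is the arcsine density. The function name `kl_bits`, the midpoint-rule integration and the choice of example are assumptions for illustration.

```python
import math

def kl_bits(f, g, a, b, n=100000):
    """Midpoint-rule estimate of D(f||g) = int f log2(f/g) over [a, b], in bits."""
    d = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * d
        fx = f(x)
        if fx > 0.0:
            total += fx * math.log2(fx / g(x)) * d
    return total

uniform = lambda x: 1.0                                          # stimulus actually presented
arcsine = lambda x: 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))   # matched PDF for T(x) = x

loss = kl_bits(uniform, arcsine, 0.0, 1.0)
print(loss)   # information lost to the mismatch, in bits
```

For this pair the integral can also be done by hand, giving $\log_2\pi - 1/\ln 2 \approx 0.209$ bits, which the numerical estimate should reproduce closely.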
Another unappreciated consequence of maximizing the mutual information is that regardless of whether the stimulus
is optimized for a given sigmoidal tuning curve, or vice versa, the resulting Fisher information can be written as a
function of the stimulus PDF,

$$J^{\circ}(x) = k_J^2 f_x^2(x). \qquad (7)$$

It is stated in [l^l that constant Fisher information provides Fisher-optimal neural codes. From Eq. (7), the Fisher
information at Shannon-optimality is constant iff the stimulus is uniformly distributed. The discussion in 1J| relates
to the mean square error (MSE) between a stimulus and a neural response, rather than the mutual information.
We therefore conclude that while a uniform stimulus with the corresponding Shannon-optimal tuning curve will
provide the minimum MSE out of all stimulus distributions, otherwise constant Fisher information and Shannon
optimality do not coincide.

Further to this, the Cramér-Rao bound states that the reciprocal of the Fisher information provides a lower bound
on achievable conditional MSE estimates of $x$ 0]. The expected value of this is a lower bound on the MSE between
$x$ and any estimator for $x$ derived from the mean firing rate $y$. If this lower bound is asymptotically achievable, e.g.
by requiring a large number of observations, or $s \to 0$, then it is known as the minimum asymptotic square error
(MASE) [1^]. From Eq. (7), the MASE when the stimulus and tuning curve jointly maximize $I(x,y)$ is

$$\mathrm{MASE}^{\circ} = \int_x \frac{f_x(x)}{J^{\circ}(x)}\,dx = \frac{1}{k_J^2}\int_x \frac{1}{f_x(x)}\,dx. \qquad (8)$$

Clearly, if $f_x(x)$ has long tails, the integral in Eq. (8) may diverge, which indicates the MASE is not achievable by
any estimator and that maximizing mutual information and minimizing MASE are not equivalent.

The general observations above are now illustrated and verified for a specific example where the variance and mean 
are related quadratically as 

$$V(x) = s^2\,T(x)\,(1 - T(x)). \qquad (9)$$
The integrals in Eq. (5) can be solved for this relationship and several examples where it holds have appeared in the
experimental neural literature [18]. We find that $k_J = \pi\sqrt{k_1}/s$, and hence the optimal stimulus PDF is

$$f_x^{\circ}(x) = \frac{1}{\pi\sqrt{T(x)(1 - T(x))}}\,\frac{dT(x)}{dx}. \qquad (10)$$

Integrating and inverting Eq. (10) leads to the optimal tuning curve,

$$T^{\circ}(x) = 0.5 - 0.5\cos(\pi F_x(x)), \qquad (11)$$

where $F_x(\cdot)$ is the CDF of the stimulus. The resultant maximum mutual information is

$$I^{\circ}(x,y) = 0.5\log_2\left(\frac{\pi k_1}{2 e s^2}\right) + k_2. \qquad (12)$$

In comparison, for the Poisson case $V(x) = s^2 T(x)$, the optimal tuning curve is $T^{\circ}(x) = F_x^2(x)$, and the maximum
mutual information is reduced by $0.5\log_2(\pi/2)$.
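The consistency of Eqs (10) and (11) can be checked numerically for any particular stimulus. The Python sketch below does this for a standard Gaussian stimulus, chosen here only as an example: it builds the tuning curve of Eq. (11) from the Gaussian CDF, differentiates it by a central difference, and substitutes the result into the right-hand side of Eq. (10), which should return the Gaussian PDF. All function names are hypothetical.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def optimal_tuning(x, cdf):
    """Eq. (11): T(x) = 0.5 - 0.5 cos(pi F(x))."""
    return 0.5 - 0.5 * math.cos(math.pi * cdf(x))

def implied_pdf(x, cdf, dx=1e-5):
    """Right-hand side of Eq. (10), with dT/dx from a central difference."""
    T = optimal_tuning(x, cdf)
    dT = (optimal_tuning(x + dx, cdf) - optimal_tuning(x - dx, cdf)) / (2.0 * dx)
    return dT / (math.pi * math.sqrt(T * (1.0 - T)))

for x in (-1.5, 0.0, 0.7):
    print(x, normal_pdf(x), implied_pdf(x, normal_cdf))
```

The two printed columns should agree to within the finite-difference error, and swapping in another CDF for `normal_cdf` repeats the check for a different stimulus distribution.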

Eq. (11) is plotted for several stimulus distribution examples in Fig. 2, while Eq. (10) is plotted for several tuning
curves in Fig. 3. The most likely values of the optimal stimulus are not necessarily close to the mean. For example, the
optimal stimulus for a linear tuning curve has an arcsine distribution, which has a U-shaped PDF (Fig. 3(b), middle
plot), while for a hyperbolic tangent tuning curve, the optimal PDF is the bell-shaped hyperbolic secant distribution
(Fig. 3(b), left-most plot).
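Both examples follow directly from Eq. (10), as the short Python sketch below illustrates. The parameterizations are assumptions made here, since the text does not give them explicitly: the linear curve is taken as $T(x) = x/a$ on $(0, a)$ and the hyperbolic tangent curve as $T(x) = 0.5(1 + \tanh x)$. Under those assumptions Eq. (10) gives $1/(\pi\sqrt{x(a - x)})$ and $\mathrm{sech}(x)/\pi$ respectively.

```python
import math

def optimal_stimulus_pdf(T, dT, x):
    """Eq. (10): f(x) = T'(x) / (pi * sqrt(T(x) * (1 - T(x))))."""
    t = T(x)
    return dT(x) / (math.pi * math.sqrt(t * (1.0 - t)))

# Hypothetical parameterizations of the two tuning curves.
a = 2.0
linear = lambda x: x / a                     # valid on (0, a)
d_linear = lambda x: 1.0 / a
tanh_T = lambda x: 0.5 * (1.0 + math.tanh(x))
d_tanh_T = lambda x: 0.5 / math.cosh(x) ** 2

x = 0.4
# Arcsine density on (0, a) versus Eq. (10) for the linear curve.
print(optimal_stimulus_pdf(linear, d_linear, x), 1.0 / (math.pi * math.sqrt(x * (a - x))))
# Hyperbolic secant density versus Eq. (10) for the tanh curve.
print(optimal_stimulus_pdf(tanh_T, d_tanh_T, x), 1.0 / (math.pi * math.cosh(x)))
```

Each printed pair should match to floating-point precision, which shows the U-shaped and bell-shaped densities arising from the same formula.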

From Eq. (12), the maximum mutual information increases logarithmically with decreasing $s$. To illustrate the
validity of this result for the example of Eq. (9), Fig. 4 shows the exact mutual information calculated numerically
for the model of Eq. (1), as a function of $s$, with the tuning curve and stimulus optimally matched, and $\xi$ Gaussian.
Also shown is the mutual information of Eq. (12), and the percentage error between the two cases. Clearly Eq. (12)
forms a lower bound to the actual mutual information, as discussed in [13], while the error falls to less than 1% for
$s < 0.04$.

We now use Eq. (9) to verify our observations about the differences between Shannon and Fisher optimality. If
the stimulus is uniform on $[0, a]$ then the Fisher information is constant. From Eq. (7), the MASE for the Shannon
optimal tuning curve is $\mathrm{MASE}^{\circ} = (as)^2/(\pi^2 k_1)$. On the other hand, if the tuning curve is $T(x) = x/a$ on $[0, a]$, the
Shannon-optimal stimulus has an arcsine distribution on the same interval, the Fisher information is non-constant,
and $\mathrm{MASE}_2 = (as)^2/(8 k_1)$. It is clear that $\mathrm{MASE}_2 > \mathrm{MASE}^{\circ}$, which agrees with constant Fisher information being
Shannon optimal only for uniformly distributed stimuli. Indeed, when the stimulus is non-uniformly distributed,
different classes of optimal tuning curves to Eq. (11) might result if the objective was to minimize the MSE instead
of maximizing $I(x, y)$.
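The two MASE values quoted above can be reproduced by integrating the reciprocal Fisher information against the stimulus PDF. The Python sketch below does so with a midpoint rule, under the quadratic mean-variance relation. The function name `mase`, the symbol `k1` for the noise-dependent constant in the Fisher information, and the values of `a`, `s` and `k1` are assumptions for illustration only.

```python
import math

def mase(pdf, T, dT, s, k1, lo, hi, n=200000):
    """Midpoint estimate of E[1/J(x)] with J = k1 * T'(x)^2 / (s^2 T (1 - T))."""
    d = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * d
        t = T(x)
        inv_J = s * s * t * (1.0 - t) / (k1 * dT(x) ** 2)
        total += pdf(x) * inv_J * d
    return total

a, s, k1 = 2.0, 0.1, 1.0   # illustrative values

# Case 1: uniform stimulus with the matching cosine-shaped tuning curve.
uni = lambda x: 1.0 / a
T1 = lambda x: 0.5 - 0.5 * math.cos(math.pi * x / a)
dT1 = lambda x: 0.5 * math.pi / a * math.sin(math.pi * x / a)

# Case 2: linear tuning curve with its matching arcsine stimulus.
arc = lambda x: 1.0 / (math.pi * math.sqrt(x * (a - x)))
T2 = lambda x: x / a
dT2 = lambda x: 1.0 / a

m1 = mase(uni, T1, dT1, s, k1, 0.0, a)
m2 = mase(arc, T2, dT2, s, k1, 0.0, a)
print(m1, (a * s) ** 2 / math.pi ** 2)
print(m2, (a * s) ** 2 / 8.0)
```

The first pair should agree with $(as)^2/\pi^2$ and the second with $(as)^2/8$, so the linear curve with its matched arcsine stimulus gives the larger error, because $\pi^2 > 8$.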

In closing, if the assumption that $s$ is small is violated, Eq. (6) provides a lower bound to the true mutual information
achieved for a given stimulus and tuning curve. How different the optimal tuning curve may be for a given stimulus in 
the event that $s$ is not small is an open question. Based on preliminary numerical calculations [25], we conjecture that
the optimal tuning curve for $s \ll 1$ is composed of a large number of discrete jumps, rather than a smooth increase,
which converges to $T^{\circ}(x)$ as $s \to 0$. This observation is supported by somewhat related calculations in 0, [H, [20].
Future work will address other examples of non-Poisson variability, and consider spontaneous firing and relative 
refractoriness. 
FIG. 1: Normalized variance of firing rate $V(x)$ as a function of normalized mean firing rate $T(x)$ for stimulus $x$, and a maximum
of $N = 5$ spikes: (i) the solid line is the sub-Poisson example considered in this paper, i.e. Eq. (9) with $s^2 = 0.2$; (ii) the dashed
line shows the Poisson case where the un-normalized mean is equal to the variance; (iii) the dotted line shows the minimum
variance case for $s^2 = 0.2$.
(a) Stimulus PDFs compared with $V^{\circ}(x)$ and $T^{\circ\prime}(x)$.
(b) Three optimal tuning curves $T^{\circ}(x)$ for $f_x(x)$ in (a).

FIG. 2: (a) PDFs of three stimulus distributions (solid lines), compared with the derivative of the derived optimal tuning curve
$T^{\circ\prime}(x)$ (dashed line), and the optimal variability $V^{\circ}(x)$ (dotted line). Each has a different mean to illustrate that the mean is
not significant. (b) Derived optimal sigmoidal tuning curves (normalized mean firing rate) against stimulus intensity for the
three distributions shown in (a). Dotted lines show $T^{\circ}(x) \pm V^{\circ}(x)$ (from Eq. (9)) with $s = 1$. This $s$ has been chosen to be very
large so that the stimulus dependent variability is clear.
(a) Three sigmoidal tuning curves, $T(x)$.
(b) Optimal stimulus PDFs compared to $V(x)$, $T'(x)$.

FIG. 3: (a) Three sigmoidal tuning curves (normalized mean firing rate) against stimulus intensity. As in Fig. 2(b), dotted
lines show $T^{\circ}(x) \pm V^{\circ}(x)$ with $s = 1$. (b) The optimal stimulus PDF for each tuning curve (solid lines), compared with the
derivative of the tuning curve (dashed line), and the variability $V^{\circ}(x)$ (dotted line).




FIG. 4: Comparison between the exact mutual information, $I(x,y)$, for Eqs (1) and (9) and the derived mutual information
(Eq. (12)) for an optimally matched stimulus and tuning curve, as a function of $s$.



